We start as yesterday, by loading a few needed packages, and by defining a few custom functions we use below.

library(pacman)
p_load(tidyverse, ggeffects, texreg, arm, broom.mixed,
       ggthemes, interplot, HLMdiag, DHARMa, dotwhisker,
       knitr, kableExtra, magrittr)

We also define here a few helpful functions we will rely on in this code file.
# Function from Zoltan Fazekas - it's a user-defined function, so it has to be
# re-loaded with every new R session.
fun_two_SD <- function(x) {
  (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))
}

We'll use the same data set as in the previous days. We keep only the variables we're interested in; compared to the previous days, I add an additional level-2 predictor.
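To make the rescaling concrete: dividing by two standard deviations (Gelman's proposal) leaves a variable with mean 0 and standard deviation 0.5, so its coefficient is roughly comparable to that of a binary predictor. A quick sanity check on simulated data (a sketch; the data are made up for illustration):

```r
# Two-SD rescaling, as in fun_two_SD above
fun_two_SD <- function(x) {
  (x - mean(x, na.rm = TRUE)) / (2 * sd(x, na.rm = TRUE))
}

set.seed(42)                      # simulated data, for illustration only
x <- rnorm(1000, mean = 10, sd = 4)
z <- fun_two_SD(x)

mean(z)  # essentially 0
sd(z)    # exactly 0.5
```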
df_issp <- readRDS("../02-data/01-ISSP.rds")
df_issp %<>%
dplyr::select(cnt, year, country, poleff, female,
age10, educ, urban, incquart, ti_cpi,
gini10) %>%
na.omit()
df_issp %<>%
mutate(inc1 = case_when(incquart==1 ~ 1,
incquart %in% c(2,3,4) ~ 0),
inc2 = case_when(incquart==2 ~ 1,
incquart %in% c(1,3,4) ~ 0),
inc3 = case_when(incquart==3 ~ 1,
incquart %in% c(1,2,4) ~ 0),
inc4 = case_when(incquart==4 ~ 1,
incquart %in% c(1,2,3) ~ 0))

Let's go through yesterday's centering procedures again, generate the last three models we looked at, and display the comparison table for the models.
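The four income dummies above can also be built in one short loop; a base-R sketch on toy data (hypothetical values standing in for incquart):

```r
# Toy data standing in for df_issp$incquart
toy <- data.frame(incquart = c(1, 2, 4, 3, 2))

# One dummy per quartile: inc_k = 1 if incquart == k, 0 otherwise
for (k in 1:4) {
  toy[[paste0("inc", k)]] <- as.integer(toy$incquart == k)
}

toy$inc2  # 0 1 0 0 1
```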
df_issp %<>%
group_by(cnt) %>%
mutate(age10CWC = fun_two_SD(age10),
educCWC = fun_two_SD(educ),
femCWC = fun_two_SD(female),
urbanCWC = fun_two_SD(urban),
inc1CWC = fun_two_SD(inc1),
inc2CWC = fun_two_SD(inc2),
inc3CWC = fun_two_SD(inc3),
inc4CWC = fun_two_SD(inc4))

df_agg <- df_issp %>%
group_by(cnt) %>%
summarise(ti_cpi = mean(ti_cpi, na.rm = TRUE),
gini10 = mean(gini10, na.rm = TRUE)) %>%
mutate(cpiCGM = fun_two_SD(ti_cpi),
gini10CGM = fun_two_SD(gini10)) %>%
dplyr::select(-ti_cpi, -gini10)

df_issp <- left_join(df_issp, df_agg, by = c("cnt"))
rm(df_agg)

We estimate four specifications, in increasing order of complexity, starting with the null model.
mlm.0 <- lmer(poleff ~ 1 + (1 | cnt),
data = df_issp,
control = lmerControl(optimizer = "bobyqa"))
mlm.1 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
(1 | cnt),
data = df_issp,
control = lmerControl(optimizer = "bobyqa"))
mlm.2 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
(1 + educCWC | cnt),
data = df_issp,
control = lmerControl(optimizer = "bobyqa"))
mlm.3 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
educCWC * cpiCGM + (1 + educCWC | cnt),
data = df_issp,
control = lmerControl(optimizer = "bobyqa"))

As you saw in the previous days, the summary() function gives you no indication of model fit. A set of dedicated functions exists for this.
logLik(mlm.1)

'log Lik.' -39462.9 (df=11)

logLik(mlm.2)

'log Lik.' -39362.13 (df=13)

logLik(mlm.3)

'log Lik.' -39360.93 (df=14)
You also have functions for AIC and BIC (there's little sense in a dedicated function for the deviance, since it's simply \(-2 \times logLik\)).
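As a reminder of what these indices are, both can be reconstructed from the log-likelihood by hand; a sketch using a plain lm() fit on a built-in data set:

```r
# AIC = -2*logLik + 2*k;  BIC = -2*logLik + log(n)*k
fit <- lm(mpg ~ wt, data = mtcars)
ll  <- logLik(fit)
k   <- attr(ll, "df")   # number of estimated parameters (incl. sigma)
n   <- nobs(fit)

aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

all.equal(aic_manual, AIC(fit))  # TRUE
all.equal(bic_manual, BIC(fit))  # TRUE
```

The log(n) penalty is why BIC punishes extra parameters more heavily than AIC in large samples, which is also why BIC prefers mlm.2 while AIC marginally prefers mlm.3 above.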
AIC(mlm.1)

[1] 78947.8

AIC(mlm.2)

[1] 78750.27

AIC(mlm.3)

[1] 78749.86

BIC(mlm.1)

[1] 79041.69

BIC(mlm.2)

[1] 78861.23

BIC(mlm.3)

[1] 78869.36
For a test of whether the differences in fit are statistically significant, we can turn to the anova() function.
anova(mlm.1, mlm.2, mlm.3)

Data: df_issp
Models:
mlm.1: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + (1 | cnt)
mlm.2: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + (1 + educCWC | cnt)
mlm.3: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + educCWC * cpiCGM + (1 + educCWC | cnt)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
mlm.1 11 78885 78979 -39431 78863
mlm.2 13 78689 78800 -39332 78663 199.4736 2 < 2.2e-16 ***
mlm.3 14 78684 78804 -39328 78656 7.2965 1 0.006909 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The function automatically re-fits every model in the comparison with full maximum likelihood (ML) rather than REML, and produces a comparison table for the three models.
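The chi-square statistics in that table are simply differences in the (ML) deviances, with degrees of freedom equal to the difference in parameter counts, so the p-value can be recovered by hand. A sketch, using rounded numbers from the table above:

```r
# LRT of mlm.1 vs mlm.2, reconstructed from the anova() table
dev1 <- 78863   # ML deviance of mlm.1 (rounded, from the table)
dev2 <- 78663   # ML deviance of mlm.2 (rounded)
chisq <- dev1 - dev2        # close to the 199.47 printed (rounding)
npar_diff <- 13 - 11        # difference in number of parameters

pchisq(chisq, df = npar_diff, lower.tail = FALSE)  # effectively zero
```

One caveat: when the added term is a variance component (as when moving from mlm.1 to mlm.2), the null hypothesis sits on the boundary of the parameter space, so this p-value is conservative.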
As an additional specification, let's also try a model that includes Gini as a level-2 predictor of the intercept.
mlm.4 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
gini10CGM + educCWC * cpiCGM + (1 + educCWC | cnt),
data = df_issp,
control = lmerControl(optimizer = "bobyqa"))
summary(mlm.4)

Linear mixed model fit by REML ['lmerMod']
Formula: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC +
inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM +
(1 + educCWC | cnt)
Data: df_issp
Control: lmerControl(optimizer = "bobyqa")
REML criterion at convergence: 78724.3
Scaled residuals:
Min 1Q Median 3Q Max
-4.3103 -0.6728 -0.0112 0.6700 8.7720
Random effects:
Groups Name Variance Std.Dev. Corr
cnt (Intercept) 0.07432 0.2726
educCWC 0.01094 0.1046 -0.29
Residual 0.47063 0.6860
Number of obs: 37628, groups: cnt, 35
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.963145 0.046234 64.089
age10CWC 0.121616 0.007487 16.244
femCWC -0.121016 0.007118 -17.001
educCWC 0.322669 0.019467 16.575
urbanCWC 0.044423 0.007207 6.164
inc2CWC 0.057079 0.008774 6.505
inc3CWC 0.113761 0.008949 12.712
inc4CWC 0.218900 0.009301 23.534
cpiCGM 0.267760 0.106317 2.519
gini10CGM -0.063952 0.103126 -0.620
educCWC:cpiCGM 0.108093 0.038925 2.777
Correlation of Fixed Effects:
(Intr) a10CWC femCWC edcCWC urbCWC in2CWC in3CWC in4CWC cpiCGM
age10CWC 0.000
femCWC 0.000 0.024
educCWC -0.265 0.096 -0.003
urbanCWC 0.000 -0.007 -0.013 -0.057
inc2CWC 0.000 0.025 0.035 -0.039 0.005
inc3CWC 0.000 0.077 0.057 -0.071 0.002 0.515
inc4CWC 0.000 0.078 0.079 -0.115 -0.036 0.510 0.540
cpiCGM 0.000 0.000 0.001 -0.001 0.000 0.001 0.001 0.000
gini10CGM 0.000 0.000 0.001 -0.003 0.000 0.003 0.001 0.000 0.470
edcCWC:cCGM 0.000 -0.009 -0.005 0.001 0.001 0.000 -0.003 -0.007 -0.234
g10CGM
age10CWC
femCWC
educCWC
urbanCWC
inc2CWC
inc3CWC
inc4CWC
cpiCGM
gini10CGM
edcCWC:cCGM 0.006
Is there maybe a differential effect of Gini on education groups, though?
mlm.5 <- lmer(poleff ~ 1 + age10CWC + femCWC + educCWC +
urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM +
gini10CGM + educCWC * cpiCGM + educCWC * gini10CGM +
(1 + educCWC | cnt),
data = df_issp,
control = lmerControl(optimizer = "bobyqa"))
summary(mlm.5)

Linear mixed model fit by REML ['lmerMod']
Formula: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC +
inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM +
educCWC * gini10CGM + (1 + educCWC | cnt)
Data: df_issp
Control: lmerControl(optimizer = "bobyqa")
REML criterion at convergence: 78719.7
Scaled residuals:
Min 1Q Median 3Q Max
-4.3072 -0.6730 -0.0110 0.6703 8.7698
Random effects:
Groups Name Variance Std.Dev. Corr
cnt (Intercept) 0.073097 0.27036
educCWC 0.008184 0.09046 -0.27
Residual 0.470626 0.68602
Number of obs: 37628, groups: cnt, 35
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.963145 0.045854 64.621
age10CWC 0.121602 0.007484 16.248
femCWC -0.121114 0.007117 -17.017
educCWC 0.323276 0.017324 18.661
urbanCWC 0.044417 0.007206 6.164
inc2CWC 0.056884 0.008774 6.483
inc3CWC 0.113726 0.008949 12.709
inc4CWC 0.219089 0.009300 23.558
cpiCGM 0.307930 0.106368 2.895
gini10CGM 0.019012 0.106285 0.179
educCWC:cpiCGM 0.046184 0.039708 1.163
educCWC:gini10CGM -0.124201 0.038787 -3.202
Correlation of Fixed Effects:
(Intr) a10CWC femCWC edcCWC urbCWC in2CWC in3CWC in4CWC cpiCGM
age10CWC 0.000
femCWC 0.000 0.024
educCWC -0.234 0.108 -0.003
urbanCWC 0.000 -0.007 -0.013 -0.064
inc2CWC 0.000 0.026 0.035 -0.044 0.005
inc3CWC 0.000 0.077 0.057 -0.080 0.002 0.515
inc4CWC 0.000 0.079 0.079 -0.129 -0.036 0.510 0.540
cpiCGM 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
gini10CGM 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.485
edcCWC:cCGM 0.000 -0.010 -0.003 -0.005 0.001 0.005 0.000 -0.006 -0.236
edCWC:10CGM 0.000 -0.001 0.005 -0.012 0.000 0.011 0.004 0.000 -0.117
g10CGM eCWC:C
age10CWC
femCWC
educCWC
urbanCWC
inc2CWC
inc3CWC
inc4CWC
cpiCGM
gini10CGM
edcCWC:cCGM -0.115
edCWC:10CGM -0.242 0.495
Which one fits the data better?
anova(mlm.3, mlm.4, mlm.5)

Data: df_issp
Models:
mlm.3: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + educCWC * cpiCGM + (1 + educCWC | cnt)
mlm.4: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM + (1 + educCWC | cnt)
mlm.5: poleff ~ 1 + age10CWC + femCWC + educCWC + urbanCWC + inc2CWC + inc3CWC + inc4CWC + cpiCGM + gini10CGM + educCWC * cpiCGM + educCWC * gini10CGM + (1 + educCWC | cnt)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
mlm.3 14 78684 78804 -39328 78656
mlm.4 15 78686 78814 -39328 78656 0.3115 1 0.576750
mlm.5 16 78678 78815 -39323 78646 9.7776 1 0.001767 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let’s take Model 5, which seemed to be the most promising of all the specifications tested so far, at least if we judge based on the indices of model fit discussed earlier today and the LRT I conducted above.
We start by examining the Level-1 residuals. A dedicated function from the HLMdiag package for computing the within-group residuals is hlm_resid().
df_resid <- hlm_resid(mlm.5,
level = 1)

Look at a plot of residuals vs. the outcome…
ggplot(data = df_resid,
aes(x = poleff,
y = .resid)) +
geom_point() +
theme_clean() +
labs(x = "Political efficacy",
y = "Residual")

It's normal that this plot is tilted; getting rid of the tilt requires plotting residuals against fitted values instead. Try it and see what comes out.
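For intuition, the same pattern appears with a simple OLS fit: residuals are mechanically correlated with the outcome, but uncorrelated with the fitted values by construction (a sketch on a built-in data set):

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

cor(mtcars$mpg, resid(fit))              # positive: the "tilt"
round(cor(fitted(fit), resid(fit)), 10)  # ~0 by construction in OLS
```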
… as well as residuals against the predictors at L1
ggplot(data = df_resid,
aes(x = age10CWC,
y = .resid)) +
geom_point(alpha = 0.05) +
theme_clean() +
labs(x = "Age (centered, in decades)",
y = "Residual")

ggplot(data = df_resid,
aes(x = educCWC,
y = .resid)) +
geom_point() +
theme_clean() +
labs(x = "Education (centered)",
y = "Residual")

With dichotomous predictors there is little chance of diagnosing a non-linear trend, so we won't generate those plots.
This is the Bartlett-Kendall test we discussed.
bartlett.test(formula = .resid ~ cnt,
data = df_resid)
Bartlett test of homogeneity of variances
data: .resid by cnt
Bartlett's K-squared = 463.36, df = 34, p-value < 2.2e-16
Unfortunately, we soundly reject \(H_0\), which means that we do NOT have homogeneity of variances of residuals across countries.
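To see what the test picks up, here is a minimal simulated example with two groups (hypothetical data; the test rejects when the group variances genuinely differ):

```r
set.seed(7)                                           # simulated data
grp     <- rep(c("A", "B"), each = 200)
unequal <- c(rnorm(200, sd = 1), rnorm(200, sd = 3))  # heteroskedastic groups
equal   <- c(rnorm(200, sd = 1), rnorm(200, sd = 1))  # homoskedastic groups

bartlett.test(unequal ~ as.factor(grp))$p.value  # tiny: reject H0
bartlett.test(equal ~ as.factor(grp))$p.value    # typically large: fail to reject
```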
df_resid %>%
group_by(cnt) %>%
summarise(VAR = var(.resid)) %>%
ungroup() %>%
slice(1:15) %>%
kable(caption = "Country-by-country variance of residuals",
caption.above = TRUE,
digits = 2,
col.names = c("Country", "Residual variance"))

| Country | Residual variance |
|---|---|
| AT | 0.51 |
| AU | 0.46 |
| BE | 0.50 |
| CH | 0.38 |
| CL | 0.40 |
| CZ | 0.52 |
| DE | 0.52 |
| DK | 0.62 |
| EE | 0.38 |
| ES | 0.50 |
| FI | 0.52 |
| FR | 0.34 |
| GB | 0.43 |
| GE | 0.46 |
| HR | 0.42 |
If this were an analysis intended for publication, there would be little reason to advance further, but for didactic purposes we can proceed.
ggplot(df_resid,
aes(sample = .resid)) +
stat_qq() +
stat_qq_line() +
theme_clean()

We can produce the same types of plots for the Level-2 residuals, though now we get only one residual per country.
df_resid <- hlm_resid(mlm.5,
level = "cnt") %>%
as.data.frame()

ggplot(df_resid,
aes(sample = .ranef.intercept)) +
stat_qq() +
stat_qq_line() +
theme_clean()

ggplot(df_resid,
aes(sample = .ranef.educ_cwc)) +
stat_qq() +
stat_qq_line() +
theme_clean()

Keep in mind, though, that these are not residuals in the proper sense of the word, as we don't know what the "true" slope is in a country. Instead, each residual is a prediction of what this slope is, and it comes with its own uncertainty interval.
ranef(mlm.5)$cnt
(Intercept) educCWC
AT -0.19875454 0.050986649
AU 0.15786534 -0.068155516
BE -0.12543206 -0.050565005
CH 0.21722101 0.018129943
CL -0.02668506 0.097934989
CZ -0.19839797 0.079336498
DE 0.05371625 0.057589852
DK 0.09637863 -0.016330974
EE -0.19037225 0.004434058
ES -0.11269092 0.163057678
FI -0.09856012 0.091370141
FR 0.46568328 -0.047214007
GB -0.03299656 -0.008672287
GE -0.12039667 0.026217023
HR -0.26846920 -0.020141842
IL -0.23871591 0.051822863
IN -0.03528847 -0.160744238
IS 0.42862890 -0.115941606
JP 0.04141765 -0.144071675
KR -0.19639951 0.019741834
LT -0.36996615 0.013633247
NL 0.14519744 0.094139612
NO 0.34176765 -0.048707860
NZ 0.09730485 -0.115032508
PH 0.15198693 -0.107164681
PL -0.35974976 -0.072951456
RU 0.10291872 0.036281355
SE 0.11884518 -0.001118316
SI -0.48044712 0.054427528
SK -0.24016026 -0.018662586
TR 0.33326551 -0.123082888
TW -0.04422412 0.047898821
US -0.01650215 0.092610027
VE 0.77736894 0.027902632
ZA -0.17535750 0.091042697
with conditional variances for "cnt"
se.ranef(mlm.5)$cnt
(Intercept) educCWC
AT 0.02523909 0.04386562
AU 0.02398895 0.04219476
BE 0.01668830 0.03123589
CH 0.02169577 0.03897322
CL 0.02283165 0.04059436
CZ 0.02037335 0.03702294
DE 0.01780911 0.03305011
DK 0.01670809 0.03126832
EE 0.02997563 0.04966262
ES 0.01989647 0.03630310
FI 0.02207438 0.03951912
FR 0.02662697 0.04565077
GB 0.01889037 0.03475583
GE 0.01985476 0.03623972
HR 0.02674815 0.04580318
IL 0.02558785 0.04432109
IN 0.02109257 0.03809200
IS 0.02488781 0.04340218
JP 0.01992162 0.03634128
KR 0.01858292 0.03427530
LT 0.02515410 0.04375393
NL 0.01871383 0.03448035
NO 0.01956995 0.03580518
NZ 0.02891813 0.04843999
PH 0.01990484 0.03631581
PL 0.01574311 0.02967018
RU 0.02458232 0.04299530
SE 0.02451942 0.04291109
SI 0.02844191 0.04787619
SK 0.02176117 0.03906791
TR 0.02081882 0.03768745
TW 0.01879795 0.03461176
US 0.02021352 0.03678266
VE 0.02243743 0.04003739
ZA 0.01459662 0.02772829
df_resid <- data.frame(cnt = rownames(se.ranef(mlm.5)$cnt),
int = ranef(mlm.5)$cnt$`(Intercept)`,
se.int = se.ranef(mlm.5)$cnt[ ,1],
edu = ranef(mlm.5)$cnt$educCWC,
se.edu = se.ranef(mlm.5)$cnt[ ,2])
df_resid %>%
ggplot(aes(x = reorder(cnt, -int),
y = int)) +
geom_point(size = 3) +
labs(x = "Country",
y = "Residual intercept") +
theme_clean() +
geom_errorbar(aes(ymin = int - 1.96 * se.int,
ymax = int + 1.96 * se.int),
linewidth = 1.25,
width = 0) +
coord_flip()

Venezuela might be an outlier here.
df_resid %>%
ggplot(aes(x = reorder(cnt, -edu),
y = edu)) +
geom_point(size = 3) +
labs(x = "Country",
y = "Residual education") +
theme_clean() +
geom_errorbar(aes(ymin = edu - 1.96 * se.edu,
ymax = edu + 1.96 * se.edu),
linewidth = 1.25,
width = 0) +
coord_flip()

Things seem fine with the random effects for education.
We can continue with similar plots of predictors against residuals, to diagnose linearity, as well as visually assess constant variance.
Can you quickly extract from the ISSP, for each country, its value on cpiCGM and gini10CGM? Then merge these into df_resid as additional columns, and use them to obtain two plots, with the predictor on the X-axis and the residuals on the Y-axis. All of the functions you need to do this have been introduced already.
Load up again the ISSP data, and run a more complete specification at L1. Add to it religious attendance, as well as marital status, plus a quadratic term for age (\(age^2\)). Does this improve things in terms of residual variance across the countries?
At this point, after choosing the best fitting model and making all the diagnostic checks (and adjustments) you need to make, you would report the results from your models. You can present the null model, or leave it for the appendix - your call. You would typically also present 1-2 intermediate specifications (perhaps gradually adding more categories of predictors: attitudinal, institutional etc), and then the final specification. Either in the main body of the paper, or in the appendix, you would also present a few sensitivity checks.
Suppose that we had determined above that Model 5 truly WAS the best that we could do in terms of specification. How to present it?
The standard approach is to display the table of coefficients directly in the paper. Here, the functions in the texreg package are invaluable, though other packages offer similar functionality.
texreg(list(mlm.0, mlm.2, mlm.4, mlm.5),
digits = 3,
custom.model.names = c("Model 1","Model 2","Model 3","Model 4"),
single.row = FALSE,
custom.coef.names = c("(Intercept)", "Age (in decades)", "Gender (woman)",
"Education", "Urban residence", "Income: 2nd quantile",
"Income: 3rd quantile", "Income: 4th quantile",
"Corruption perceptions", "Gini (10-point units)",
"Education * Corruption perceptions",
"Education * Gini"),
booktabs = TRUE,
dcolumn = TRUE,
use.packages = FALSE,
caption = "Comparison of multilevel specifications for political efficacy",
file = "../05-output/04-Table-mlm.tex")

htmlreg(list(mlm.0, mlm.2, mlm.4, mlm.5),
digits = 3,
custom.model.names = c("Model 1","Model 2","Model 3","Model 4"),
single.row = FALSE,
custom.coef.names = c("(Intercept)", "Age (in decades)", "Gender (woman)",
"Education", "Urban residence", "Income: 2nd quantile",
"Income: 3rd quantile", "Income: 4th quantile",
"Corruption perceptions", "Gini (10-point units)",
"Education * Corruption perceptions",
"Education * Gini"),
caption = "Comparison of multilevel specifications for political efficacy",
file = "../05-output/04-Table-mlm.html")

Alternatively, you could choose to display coefficients as dot-and-whisker plots. Though there are canned functions available, the one I am most familiar with has severe limitations.
dwplot(list(mlm.2, mlm.4, mlm.5),
show_intercept = FALSE) +
theme_clean()

This is why I show here the fully manual approach to plotting these quantities, even though it's a bit long.
First step: tidy up model results and create an indicator variable that identifies the model from which they’re obtained.
mod1tid <- tidy(mlm.2,
effects = "fixed",
conf.int = TRUE,
conf.level = 0.95)
mod1tid <- mod1tid %>%
dplyr::select(-c(effect)) %>%
mutate(model = "model1")
mod2tid <- tidy(mlm.4,
effects = "fixed",
conf.int = TRUE,
conf.level = 0.95)
mod2tid <- mod2tid %>%
dplyr::select(-c(effect)) %>%
mutate(model = "model2")
mod3tid <- tidy(mlm.5,
effects = "fixed",
conf.int = TRUE,
conf.level = 0.95)
mod3tid <- mod3tid %>%
dplyr::select(-c(effect)) %>%
mutate(model = "model3")
df_modframe <- rbind(mod1tid, mod2tid, mod3tid)
rm(mod1tid, mod2tid, mod3tid)

Second stage: make the variable names nice, and order the coefficients based on the order in which they appear in the output of the most complete specification.
# Note: the patterns must match the term names in tidy()'s output for these
# models (the centered variables), otherwise case_when() returns NA.
df_modframe %<>%
  mutate(term = case_when(term %in% "(Intercept)" ~ "(Intercept)",
                          term %in% "age10CWC" ~ "Age (decades)",
                          term %in% "femCWC" ~ "Woman",
                          term %in% "educCWC" ~ "Education",
                          term %in% "urbanCWC" ~ "Urban residence",
                          term %in% "inc2CWC" ~ "2nd income quartile",
                          term %in% "inc3CWC" ~ "3rd income quartile",
                          term %in% "inc4CWC" ~ "4th income quartile",
                          term %in% "cpiCGM" ~ "Perception of corruption",
                          term %in% "gini10CGM" ~ "Gini (10-points)",
                          term %in% "educCWC:cpiCGM" ~ "Education * Perceptions",
                          term %in% "educCWC:gini10CGM" ~ "Education * Gini"))
vec_term <- df_modframe$term[df_modframe$model == "model3"]
df_modframe$term <- factor(df_modframe$term,
levels = vec_term[length(vec_term):1],
ordered = TRUE)
rm(vec_term)

Final stage: plot the coefficients.
df_modframe %>%
dplyr::select(-c(std.error, statistic)) %>%
filter(!(term == "(Intercept)")) %>%
ggplot(aes(x = term, y = estimate, color = model)) +
geom_point(size = 3,
position = position_dodge(width = 0.5)) +
geom_errorbar(aes(ymin = conf.low,
ymax = conf.high),
width = 0.,
linewidth = 1.25,
position = position_dodge(width = 0.5)) +
geom_hline(yintercept = 0, linewidth = 1.25, linetype = "dashed",
color = rgb(255, 0, 90, maxColorValue = 255)) +
coord_flip() +
labs(y = "Estimate",
x = "Term") +
theme_clean() +
theme(axis.text.x = element_text(size = 14),
axis.text.y = element_text(size = 14)) +
scale_colour_discrete(name = "Models",
breaks = c("model1", "model2", "model3"),
labels = c("Model 1", "Model 2", "Model 3"))

rm(df_modframe)

Finally, if your focus is only on one of the effects, you can choose to present it using the standard marginal effects plot. Here as well it's best to stick to a manual approach.
pl1 <- ggpredict(mlm.5,
terms = "educCWC [-8, -4, 0, 4, 8, 12]",
type = "fe",
ci.lvl = 0.95)
ggplot(pl1,
aes(x = x, y = predicted)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = conf.low,
ymax = conf.high),
width = 0,
linewidth = 1.25) +
labs(y = "Political efficacy",
x = "Education",
title = "") +
theme_clean() +
scale_x_continuous(breaks = c(-8, -4, 0, 4, 8, 12),
labels = c("Very low","Low","Average","Above average",
"High","Very high"))

rm(pl1)

Alternatively, you can treat the variable as properly continuous.
pl2 <- ggpredict(mlm.5,
terms = "educCWC",
type = "fe",
ci.lvl = 0.95)
ggplot(pl2,
aes(x = x, y = predicted)) +
geom_line(linewidth = 2) +
geom_ribbon(aes(ymin = conf.low,
ymax = conf.high),
alpha = 0.5) +
labs(y = "Political efficacy",
x = "Education",
title = "") +
theme_clean()

Package versions used in this script.
sessionInfo()

R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] magrittr_2.0.3 kableExtra_1.3.4 knitr_1.42
[4] dotwhisker_0.7.4 DHARMa_0.4.6 HLMdiag_0.5.0
[7] interplot_0.2.3 abind_1.4-5 ggthemes_4.2.4
[10] broom.mixed_0.2.9.4 arm_1.13-1 lme4_1.1-31
[13] Matrix_1.5-3 MASS_7.3-58.2 texreg_1.38.6
[16] ggeffects_1.1.5 forcats_1.0.0 stringr_1.5.0
[19] dplyr_1.1.0 purrr_1.0.1 readr_2.1.3
[22] tidyr_1.3.0 tibble_3.1.8 ggplot2_3.4.0
[25] tidyverse_1.3.2 pacman_0.5.1
loaded via a namespace (and not attached):
[1] googledrive_2.0.0 minqa_1.2.5 colorspace_2.1-0
[4] ellipsis_0.3.2 sjlabelled_1.2.0 ggstance_0.3.6
[7] snakecase_0.11.0 parameters_0.20.2 fs_1.6.0
[10] rstudioapi_0.14 listenv_0.9.0 furrr_0.3.1
[13] farver_2.1.1 fansi_1.0.4 lubridate_1.9.1
[16] xml2_1.3.3 codetools_0.2-18 splines_4.2.1
[19] jsonlite_1.8.4 nloptr_2.0.3 interactionTest_1.2
[22] broom_1.0.3 dbplyr_2.3.0 compiler_4.2.1
[25] httr_1.4.4 backports_1.4.1 assertthat_0.2.1
[28] fastmap_1.1.0 gargle_1.3.0 cli_3.6.0
[31] htmltools_0.5.4 tools_4.2.1 coda_0.19-4
[34] gtable_0.3.1 glue_1.6.2 reshape2_1.4.4
[37] Rcpp_1.0.10 cellranger_1.1.0 vctrs_0.5.2
[40] svglite_2.1.1 nlme_3.1-161 insight_0.19.0
[43] xfun_0.36 globals_0.16.2 rvest_1.0.3
[46] timechange_0.2.0 lifecycle_1.0.3 diagonals_6.4.0
[49] googlesheets4_1.0.1 future_1.30.0 scales_1.2.1
[52] hms_1.1.2 parallel_4.2.1 yaml_2.3.7
[55] stringi_1.7.12 highr_0.10 bayestestR_0.13.0
[58] boot_1.3-28.1 rlang_1.0.6 pkgconfig_2.0.3
[61] systemfonts_1.0.4 evaluate_0.20 lattice_0.20-45
[64] htmlwidgets_1.6.1 labeling_0.4.2 tidyselect_1.2.0
[67] parallelly_1.34.0 plyr_1.8.8 R6_2.5.1
[70] generics_0.1.3 DBI_1.1.3 pillar_1.8.1
[73] haven_2.5.1 withr_2.5.0 mgcv_1.8-41
[76] datawizard_0.6.5 janitor_2.1.0 modelr_0.1.10
[79] crayon_1.5.2 utf8_1.2.2 tzdb_0.3.0
[82] rmarkdown_2.20 grid_4.2.1 readxl_1.4.1
[85] reprex_2.0.2 digest_0.6.31 webshot_0.5.4
[88] munsell_0.5.0 viridisLite_0.4.1